Named Entity Recognition for Vietnamese

نویسندگان

  • Dat Ba Nguyen
  • Son Huu Hoang
  • Son Bao Pham
  • Thai Phuong Nguyen
چکیده

Named Entity Recognition is an important task but is still relatively new for Vietnamese. It is partly due to the lack of a large annotated corpus. In this paper, we present a systematic approach in building a named entity annotated corpus while at the same time building rules to recognize Vietnamese named entities. The resulting open source system achieves an F-measure of 83%, which is better compared to existing Vietnamese NER systems. © 2010 Springer-Verlag Berlin Heidelberg. Index

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition

This paper presents a state-of-the-art system for Vietnamese Named Entity Recognition (NER). By incorporating automatic syntactic features with word embeddings as input for bidirectional Long Short-Term Memory (BiLSTM), our system, although simpler than some deep learning architectures, achieves a much better result for Vietnamese NER. The proposed method achieves an overall F1 score of 92.05% ...

متن کامل

Building English-Vietnamese Named Entity Corpus with Aligned Bilingual News Articles

Named entity recognition aims to classify words in a document into pre-defined target entity classes. It is now considered to be fundamental for many natural language processing tasks such as information retrieval, machine translation, information extraction and question answering. This paper presents a workflow to build an English-Vietnamese named entity corpus from an aligned bilingual corpus...

متن کامل

Vietnamese Named Entity Recognition using Token Regular Expressions and Bidirectional Inference

This paper describes an efficient approach to improve the accuracy of a named entity recognition system for Vietnamese. The approach combines regular expressions over tokens and a bidirectional inference method in a sequence labelling model, which achieves an overall F1 score of 89.66% on a test set of an evaluation compaign, organized in late 2016 by the Vietnamese Language and Speech Processi...

متن کامل

Construction of a Vietnamese Corpora for Named Entity Recognition

In order to build an automatic named entity recognition (NER) system using a machine learning approach, a large tagged corpus is widely seen as one necessary knowledge resource. Nevertheless, manual construction is time consuming, labor intensive and expensive. Building NER corpora for European languages has been extensively studied while some less-studied languages such as Vietnamese have not ...

متن کامل

End-to-End Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-Level Vs. Character-Level

This paper demonstrates end-to-end neural network architectures for Vietnamese named entity recognition. Our best model is the combination of bidirectional Long ShortTerm Memory (Bi-LSTM), Convolutional Neural Network (CNN), Conditional Random Field (CRF), using pre-trained word embeddings as input, which achieves an F1 score of 88.59% on a standard test set. Our system is able to achieve a com...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010